Note: The science behind sequence matching
Analysis of a newly isolated DNA molecule by comparing its sequence with
all other known sequences in search of a match involves database searching
and sequence alignment. The ultimate goal of this type of analysis is to
determine whether the new sequence bears a significant degree of similarity
to another known sequence, which may indicate homology (shared ancestry).
Perhaps the most important tool necessary for sequence matching is access
to a comprehensive and up-to-date sequence database. GenBank, the EMBL nucleotide
sequence database, and the DNA Database of Japan (DDBJ) are three partners
in a longstanding collaboration to collect all publicly available sequence
data. Sites in Bethesda, Maryland (USA), Hinxton (UK), and Mishima (Japan)
exchange new sequence data and updates over the Internet every day and make
the information immediately available to everyone by e-mail, anonymous ftp,
and the World Wide Web.
Next in importance is the computer program used to search the database.
Several different programs compare two sequences mathematically and
determine the degree of similarity between them. The
BLAST programs (BLAST is an acronym for Basic Local Alignment Search Tool)
are among the most popular. They offer a good combination of speed,
sensitivity, flexibility, and statistical rigor. (See interpreting
BLAST search results.)
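To make "degree of similarity" concrete, here is a minimal sketch of local alignment scoring using the Smith-Waterman method. This is not how BLAST works internally; BLAST uses faster heuristics to approximate this kind of exhaustive local comparison. The function name and the scoring values (match, mismatch, gap) are assumptions chosen only for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score for a and b.

    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)   # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a fresh local alignment here
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


# Two made-up sequences that share only the stretch "ACGT".
print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

A higher score means a longer or cleaner shared region. Two sequences with nothing in common score 0, because a local alignment is allowed to be empty.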
In what situations would a scientist search sequence databases? As an example,
sequence matching can be used to determine whether a newly identified DNA
sequence is part of a known gene. In the simplest scenario, if a new sequence
is identical or almost identical (except for a few nucleotide changes) to
that of a gene in the sequence database, it is reasonable to conclude that
the new sequence is either part of the same gene or of a closely related
gene. But what if two sequences that appear to be different share sections
that are identical? How do you know whether the identical sections are due
to chance or indicate some meaningful relationship between the two sequences?
Sequence analysis using BLAST or another program provides a "similarity
score" to help answer this question.
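One rough way to see whether a shared identical section could be due to chance is to shuffle one sequence many times and check how often the shuffled versions match just as well. The sketch below does this with the length of the longest identical stretch as the score. It is only an illustration: BLAST computes its significance values analytically rather than by shuffling, and the function names, trial count, and seed here are hypothetical.

```python
import random


def longest_shared_run(a, b):
    """Length of the longest identical stretch present in both sequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run ending at i-1, j-1
                best = max(best, curr[j])
        prev = curr
    return best


def chance_fraction(a, b, trials=200, seed=0):
    """Fraction of shuffled copies of b that match a at least as well as b."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = longest_shared_run(a, b)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)   # same base composition, random order
        if longest_shared_run(a, "".join(letters)) >= observed:
            hits += 1
    return hits / trials
```

A fraction near 0 suggests the shared section is unlikely to arise by chance in sequences of that composition. A fraction near 1 suggests the match is no better than random.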
If the function of a particular DNA sequence is already known (for example,
the 16S rRNA gene we have been working with in this lab), comparing its
sequence with that of the same gene from another species of bacteria provides
information about the evolutionary relationship between the two bacterial
species. The assumption here is that the number of positions that differ
in the nucleotide sequence is proportional to the time elapsed since the
two species formed their own lines of descent from a common predecessor.
However, not all DNA sequences change at a constant rate over time. For
example, it is not at all clear whether all organisms experience similar
mutation rates from purely environmental factors (from increased UV exposure,
for example). If the DNA sequence has, or has had at some point in evolution,
a functional role, selection, whose strength may be related to population
size among other things, can affect its rate of change. And, in some cases,
mutations take the form of deletions, insertions, and substitutions of long
stretches of DNA rather than single-nucleotide changes, so a simple count of
differing positions misstates the number of mutational events.
Finally, some sequences of DNA encode proteins with very specific
structural requirements, and any change may prove unfavorable to the organism.
Such sequences therefore do not tolerate change well and tend to remain
the same for long periods of time. These are referred to as "conserved"
regions. In contrast, sequences that can accommodate change more easily
are referred to as "variable" regions.
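The two ideas in this paragraph, counting the positions that differ between two species and telling conserved regions from variable ones, can be sketched in a few lines. The sketch assumes the sequences have already been aligned to the same length. The function names and the example sequences are made up for illustration and are not real 16S rRNA data.

```python
def proportion_different(seq1, seq2):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    differences = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return differences / len(seq1)


def classify_positions(aligned_seqs):
    """Label each alignment column 'conserved' or 'variable'.

    A column is 'conserved' only if every sequence has the same base there.
    """
    labels = []
    for column in zip(*aligned_seqs):
        labels.append("conserved" if len(set(column)) == 1 else "variable")
    return labels


# Hypothetical aligned fragments from three species.
aligned = ["ACGTACGT", "ACGTTCGA", "ACGTACGA"]
print(proportion_different(aligned[0], aligned[1]))
print(classify_positions(aligned))
```

Under the constant-rate assumption described above, a larger proportion of differing positions implies a longer time since the two species diverged. The caveats in this paragraph explain why that proportion is only a rough guide.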